A Methodology for Developing Multilingual Resources for Terminology
Abstract
This paper presents a project that aims at building lexical resources for terminology. By lexical resources, we mean dictionaries that provide detailed lexico-semantic information on terms, i.e. lexical units the sense of which can be related to a special subject field. In terminology, there is a lack of such resources. The specific dictionaries we are currently developing describe basic French and Korean terms that belong to the fields of computer science and the Internet (e.g. computer, configure, user-friendly, Web, browse, spam). This paper presents the structure of the French and Korean articles: each component is examined and illustrated with examples. We then describe the corpus-based methodology and the different computer applications used for developing the articles. Our methodology comprises five steps: design of the corpora; selection of terms; sense distinction; definition of actantial structures; and listing of semantic relations. Details on the current state of each database are also given.
1. Background and Motivations
This paper presents a project that aims at building lexical resources for terminology. By lexical resources, we mean dictionaries that provide detailed lexico-semantic information on terms, i.e. lexical units the sense of which can be related to a special subject field. In terminology, there is a lack of such resources since, typically, terminological dictionaries (and even more recent resources such as ontologies) focus on the knowledge structure of specialized subject fields, thereby ignoring important linguistic properties of terms. Furthermore, we cannot rely entirely on general resources (such as WordNet or other general dictionaries in electronic form), even if they have good coverage of terms, since they do not always capture subtle semantic distinctions that appear in specific fields of knowledge.
The specific dictionaries we are currently developing describe basic French and Korean terms that belong to the fields of computer science and the Internet (e.g. computer, configure, user-friendly, Web, browse, spam). Part of the French dictionary can be accessed on the Internet (DiCoInfo, Dictionnaire fondamental de l’informatique et de l’Internet: http://olst.ling.umontreal.ca/dicoinfo/). Both dictionaries: a) take into account four different parts of speech (nouns, verbs, adjectives and adverbs); b) take into account the polysemy of terms; c) describe the actantial structure of terms in terms of actantial roles; d) list the terms that can fill an actantial position; and, finally, list all the terms that are semantically related to the term being described. Our descriptions are based on Explanatory Combinatorial Lexicology (ECL) (Mel’čuk et al. 1984-1999, 1995). Other details on why this framework is useful for terminology can be found in L’Homme (2002, 2003).
Section 2 of the paper presents the structure of the articles contained in the dictionary. Each component is examined and illustrated with French and Korean examples. In section 3, we describe the corpus-based methodology used in both languages. Section 4 gives details on the current state of each database. Finally, we conclude with a short list of forthcoming projects.
2. Structure of the Dictionary
As stated above, the dictionary provides a description of various lexico-semantic properties of terms. More specifically, the articles take into account:
A) The polysemy of terms: For example, three different meanings for adresse (address) have been identified. In the dictionary, separate meanings are distinguished with a numbering system. Similarly, the meanings of the Korean form 주소 (address) are disambiguated.
adresse 1: ‘address in a storage device’
adresse 2: ‘address of a computer in a network’
adresse 3: ‘address of a user (e.g. an email address)’
주소 1: 기억장치 내의 주소 (adresse 1)
주소 2: 통신망에서 단말기의 주소 (adresse 2)
주소 3: 사용자의 전자메일 주소 (adresse 3)
B) The actantial structures of terms: Each separate meaning is accompanied by its actantial structure, which gives the position of actants and explains them in terms of actantial roles. In addition, linguistic realizations of actants are provided. We reproduce below the actantial structure and the actants of the term naviguer (browse).
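Separately from the actantial structure of naviguer, which this excerpt does not include, the sense numbering described under A) can be sketched as a small data structure. The Python sketch below is only an illustration: the names `Sense`, `Entry`, `add_sense`, `labels` and `aligned` are hypothetical and are not taken from DiCoInfo or the Korean database. The glosses are the ones listed above, and the pairing of Korean and French senses by shared number mirrors the "(adresse 1)" cross-references.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Sense:
    """One numbered meaning of a term (hypothetical structure)."""
    number: int
    gloss: str


@dataclass
class Entry:
    """A headword with its numbered senses (hypothetical structure)."""
    headword: str
    senses: List[Sense] = field(default_factory=list)

    def add_sense(self, gloss: str) -> Sense:
        # Senses are numbered consecutively from 1, as in "adresse 1".
        sense = Sense(number=len(self.senses) + 1, gloss=gloss)
        self.senses.append(sense)
        return sense

    def labels(self) -> List[str]:
        # Render each sense as "headword N: gloss".
        return [f"{self.headword} {s.number}: {s.gloss}" for s in self.senses]


adresse = Entry("adresse")
for gloss in (
    "address in a storage device",
    "address of a computer in a network",
    "address of a user (e.g. an email address)",
):
    adresse.add_sense(gloss)

juso = Entry("주소")
for gloss in (
    "기억장치 내의 주소",
    "통신망에서 단말기의 주소",
    "사용자의 전자메일 주소",
):
    juso.add_sense(gloss)

# Pair Korean and French senses by position, which here equals the sense number.
aligned: List[Tuple[str, str]] = [
    (f"{juso.headword} {ko.number}", f"{adresse.headword} {fr.number}")
    for ko, fr in zip(juso.senses, adresse.senses)
]

for line in adresse.labels() + juso.labels():
    print(line)
```

Keeping each sense as its own record, rather than one gloss string per headword, leaves room to attach further per-sense information later, such as an actantial structure or a list of related terms.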
Similar resources
Multilingual Ontologies and English-Bulgarian Ontology Development
In this paper we make a short survey of the approaches for development of multilingual ontologies. Our main goal is to find appropriate approach for development of multilingual ontologies, including Bulgarian language terminology. We propose a collaborative methodology for development of English-Bulgarian bilingual ontologies by usage of information extraction from e-learning textual content, l...
TExtractor: a multilingual terminology extraction tool
This demonstration presents a tool (TExtractor) employed for enriching terminology sets in four languages: English, French, German and Spanish. We present the associated linguistic resources and the experimental results obtained in the medical domain. TExtractor has been developed within project LIQUID (IST-2000-25324), which aims at developing a cost-effective solution for the problem of cross...
Semantics-Aware Indexing of Geospatial Resources Based on Multilingual Thesauri: Methodology and Preliminary Results
Despite the structured nature of metadata associated with geospatial resources, the discovery functionality implemented by geoportals is primarily based on the syntactic matching of users’ search pattern against descriptive metadata, such as title, abstract, or keywords. As a consequence, the retrieval process is often hampered by linguistic issues related to multilingualism, semantic heterogen...
Language Resources Production Models: the Case of the INTERA Multilingual Corpus and Terminology
This paper reports on the multilingual Language Resources (MLRs), i.e. parallel corpora and terminological lexicons for less widely digitally available languages, that have been developed in the INTERA project and the methodology adopted for their production. Special emphasis is given to the reality factors that have influenced the MLRs development approach and their final constitution. Buildin...
Bilingual terminology extraction: an approach based on a multilingual thesaurus applicable to comparable corpora
This paper presents several methods for exploiting multiple resources in bilingual lexicon extraction, either from parallel or comparable corpora. First, a special attention is given to the use of multilingual thesauri, and different search strategies based on such thesauri are investigated. Then, a method to optimally combine the different resources for bilingual lexicon extraction is presente...
Multilingual Corpus Development for Opinion Mining
Opinion Mining is a discipline that has attracted some attention lately. Most of the research in this field has been done for English or Asian languages, due to the lack of resources in other languages. In this paper we describe our methodology for developing a manually annotated multilingual corpus with fine-grained opinion and target annotations. The languages represented in the corpus are En...